min rank | avg. rank | sentence |
---|---|---|
11540 | 23829.5714 | ! salom janobi Saadi fikr mekardam islomkhohi ammo.. |
8700 | 22990.0000 | Ozodi Iltimos chop kuned, tamomi konunhoi suhanroniro rioya kardam. |
7765 | 24764.0000 | Hatman yagon opravdanie doshta boshand darkor.. |
7765 | 24440.5000 | Mebakhshed, chili shahidonro naguzaronda mardumi Tojik hej goh yagon jashnro barpo namekunand. |
7763 | 25575.3333 | Besavod. hamai mardumi tojik zaboni modarii khudsha namedona hatto gapzada nametona Zaboni Tojikira poymol kadem. |
7750 | 26398.0769 | Hub soda bdai nimi Tojikiston hudshona alov sar btan janobi olit darak namiyova. |
7600 | 12873.3333 | Тарафдори макола хастам. |
6377 | 17007.8000 | Турмахо пури генерали чинояткор шудаст. |
6123 | 11229.5000 | Монед илтимос кордор нашавед. |
5940 | 21770.5714 | Gapi shumo rost: shumo akli khurdakak dored. |
5780 | 31769.1667 | Дили шабзинидадоре дора(д) ошиқ, Ғуруре, ифтихоре дора(д) ошиқ, Ҷавонӣ, ошиқӣ, ҳиҷрону дидор.. |
5494 | 21233.8462 | Ammo yak khohish agar mushikil naboshad chunin makoloho ziyodtar peshnihod kuned khub meshud. |
5185 | 33082.5000 | Хукумати чумхури кораша тезонад. |
4795 | 21854.9000 | Mardumi nopisand, gufti hudashon mo hama vaqt "mladshie bratya" hastem. |
4485 | 21498.8000 | Padaru modari bofarhang hamesha kodir ast farzandi bomani tarbiya kunad. |
4485 | 20533.6667 | Yaane komandir Tolib raisi Badakhshon ast yo hukumati mahali? |
4277 | 24210.5714 | Натарс чура - бра ай паи корат шав! |
4015 | 20784.0625 | Vale imruz on dukhtarhoi zeboi tojik husni zeboi khudro Bo pushidani libosi avrupoi gum karda istodaand. |
4014 | 22038.4444 | Komentat boz ham haq budani sukhanhoi Richardro isbot kard. |
4014 | 22753.7778 | Shoyad javonamu hanuz dustoni vokean ham dustam atrofam hastand. |
3817 | 22828.2143 | Kudak boyad barnoma doshtaboshad, digar 24soat lazzati kudakiroki burd pas fardo vai chi meshavad. |
3733 | 17217.6000 | Инхо наход кадри одамро нафахманд. |
3553 | 22638.6667 | Вайронкунии коидаи харакати рох Салом алейкум. |
3300 | 6027.7500 | Намедонам доварони Озоди кист? |
3015 | 18071.3636 | Bar sari yak shahri maydayak, 3000 spets naz chi lozim bud? |
2798 | 27868.2500 | Истед, истед, оҳистатар, дурусттар фаҳмонед, вагарна чизеро нафаҳмидам. |
2794 | 29078.7000 | Гиречкада, биё, бача, пиЮзда биё, нУхУд арзон шид, ктшкаи беҳтарин.. |
2767 | 31936.6667 | Неки озодагон ами шабнама кати фарзона факат мекуни нигинахона нависетон камкам. |
2740 | 29999.3333 | Касе, км намоз мехонад,худоро мешиносад,одаму чамьичтк муьмину бародарро мешиносад. |
2609 | 18404.6667 | У хамеша саргарми парвандахои сиёси пуд. |
In contrast to subsection 4.5.2.1 (Maximum word rank in sentence), we now search for sentences consisting of rare words only. The sentences are ordered by the rank of the most frequent word in each sentence (the "min rank" column), in descending order. The table shows the top sentences that are longer than 40 characters.
Because these sentences are forced not to contain any everyday word, they tend to be either sentences with a very reduced structure or sentences in a foreign language. Hence, the data are useful for evaluating the preprocessing, especially language detection.
select min(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m desc limit 30;
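As an illustration, the following minimal sketch runs an equivalent query against a tiny in-memory SQLite database. Only the table names (`sentences`, `inv_w`) and column names (`s_id`, `w_id`, `sentence`) come from the query above; the toy data, the helper names `build_toy_db` and `rare_word_sentences`, and the reading of the offset 100 (that word IDs up to 100 are reserved, so rank = `w_id` - 100) are assumptions made for this example.

```python
import sqlite3

# Assumption: w_id values up to 100 are reserved, so a word's rank is w_id - 100.
RANK_OFFSET = 100


def build_toy_db():
    """Create a hypothetical in-memory database with the two tables used by the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT)")
    conn.execute("CREATE TABLE inv_w (s_id INTEGER, w_id INTEGER)")
    conn.executemany("INSERT INTO sentences VALUES (?, ?)", [
        (1, "a sentence made of frequent words only, padded to length"),
        (2, "another sentence consisting of rare words exclusively here"),
        (3, "too short"),
    ])
    conn.executemany("INSERT INTO inv_w VALUES (?, ?)", [
        (1, 101), (1, 150), (1, 5100),             # contains a very frequent word
        (2, 7100), (2, 9100), (2, 8100), (2, 50),  # w_id 50 is filtered out
        (3, 9900),                                 # sentence is too short
    ])
    return conn


def rare_word_sentences(conn, min_length=40, limit=30):
    """Return (min rank, avg rank, sentence) rows, ordered by min rank descending."""
    sql = """
        SELECT MIN(i.w_id) - ? AS m, AVG(i.w_id) - ? AS a, s.sentence
        FROM sentences s JOIN inv_w i ON s.s_id = i.s_id
        WHERE LENGTH(s.sentence) > ? AND i.w_id > ?
        GROUP BY s.s_id
        ORDER BY m DESC
        LIMIT ?
    """
    params = (RANK_OFFSET, RANK_OFFSET, min_length, RANK_OFFSET, limit)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    for min_rank, avg_rank, sentence in rare_word_sentences(build_toy_db()):
        print(min_rank, round(avg_rank, 4), sentence)
```

In the toy data, sentence 2 comes first because even its most frequent word is rare, while sentence 1 is pulled down by a single high-frequency word and sentence 3 is dropped by the length filter.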
Should we remove the sentences whose most frequent word has a rank above some threshold?
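One way to explore this question is to split the query result at a candidate threshold and inspect what would be removed. The sketch below is only an illustration: the function name `split_by_min_rank` and the threshold value 5000 are hypothetical, and the three sample rows are copied from the table above.

```python
def split_by_min_rank(rows, threshold):
    """Split (min_rank, avg_rank, sentence) rows into (kept, removed) lists.

    A row is removed when the rank of its most frequent word exceeds the threshold.
    """
    kept, removed = [], []
    for row in rows:
        min_rank = row[0]
        if min_rank > threshold:
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


# Three rows taken from the table above.
sample_rows = [
    (11540, 23829.5714, "! salom janobi Saadi fikr mekardam islomkhohi ammo.."),
    (7600, 12873.3333, "Тарафдори макола хастам."),
    (2609, 18404.6667, "У хамеша саргарми парвандахои сиёси пуд."),
]

if __name__ == "__main__":
    # The threshold 5000 is an arbitrary value chosen for illustration.
    kept, removed = split_by_min_rank(sample_rows, threshold=5000)
    print(len(kept), "kept;", len(removed), "removed")
```

Reviewing the removed rows by hand for several thresholds would show how many well-formed sentences in the target language are lost at each cut-off.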